research plan
Idea2Plan: Exploring AI-Powered Research Planning
Jin Huang, Silviu Cucerzan, Sujay Kumar Jauhar, Ryen W. White
Large language models (LLMs) have demonstrated significant potential to accelerate scientific discovery as valuable tools for analyzing data, generating hypotheses, and supporting innovative approaches in various scientific fields. In this work, we investigate how LLMs can handle the transition from conceptual research ideas to well-structured research plans. Effective research planning not only supports scientists in advancing their research but also represents a crucial capability for the development of autonomous research agents. Despite its importance, the field lacks a systematic understanding of LLMs' research planning capability. To rigorously measure this capability, we introduce the Idea2Plan task and Idea2Plan Bench, a benchmark built from 200 ICML 2025 Spotlight and Oral papers released after major LLM training cutoffs. Each benchmark instance includes a research idea and a grading rubric capturing the key components of valid plans. We further propose Idea2Plan JudgeEval, a complementary benchmark to assess the reliability of LLM-based judges against expert annotations. Experimental results show that GPT-5 and GPT-5-mini achieve the strongest performance on the benchmark, though substantial headroom remains for future improvement. Our study provides new insights into LLMs' capability for research planning and lays the groundwork for future progress.
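The abstract does not spell out the judging protocol, but a rubric-weighted LLM-judge loop is one plausible shape for it. The sketch below is purely illustrative: RubricItem, grade_plan, and the YES/NO judge prompt are assumptions, not the benchmark's actual interface.

```python
# Hypothetical sketch of rubric-based plan grading in the spirit of
# Idea2Plan Bench. All names and the prompt format are assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class RubricItem:
    criterion: str   # one key component a valid plan must cover
    weight: float    # relative importance of the criterion

def grade_plan(plan: str, rubric: list[RubricItem],
               judge: Callable[[str], str]) -> float:
    """Score a research plan by asking an LLM judge about each rubric item."""
    earned = 0.0
    for item in rubric:
        prompt = (
            "Does the following research plan satisfy this criterion? "
            f"Answer YES or NO.\nCriterion: {item.criterion}\nPlan: {plan}"
        )
        if judge(prompt).strip().upper().startswith("YES"):
            earned += item.weight
    return earned / sum(item.weight for item in rubric)
```

A setup like this is also what Idea2Plan JudgeEval would stress-test: the same grade_plan run with an LLM judge versus expert annotators reveals how far the two diverge.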
LLMs for Bayesian Optimization in Scientific Domains: Are We There Yet?
Rushil Gupta, Jason Hartford, Bang Liu
Large language models (LLMs) have recently been proposed as general-purpose agents for experimental design, with claims that they can perform in-context experimental design. We evaluate this hypothesis using both open- and closed-source instruction-tuned LLMs applied to genetic perturbation and molecular property discovery tasks. We find that LLM-based agents show no sensitivity to experimental feedback: replacing true outcomes with randomly permuted labels has no impact on performance. Across benchmarks, classical methods such as linear bandits and Gaussian process optimization consistently outperform LLM agents. We further propose a simple hybrid method, LLM-guided Nearest Neighbour (LLMNN) sampling, that combines LLM prior knowledge with nearest-neighbor sampling to guide the design of experiments. LLMNN achieves competitive or superior performance across domains without requiring significant in-context adaptation. These results suggest that current open- and closed-source LLMs do not perform in-context experimental design in practice and highlight the need for hybrid frameworks that decouple prior-based reasoning from batch acquisition with updated posteriors.
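The abstract gives only the intuition behind LLMNN: the LLM proposes promising candidates from prior knowledge, and nearest-neighbor sampling grounds those proposals in the actual candidate pool. The sketch below follows that reading under stated assumptions; llmnn_batch and its arguments are illustrative names, not the authors' implementation.

```python
# A minimal sketch of LLM-guided nearest-neighbour batch selection, assuming
# each LLM suggestion is mapped to its nearest unmeasured candidate in
# feature space. Illustrative only; not the paper's code.
import numpy as np

def llmnn_batch(pool_features: np.ndarray,
                suggestion_features: np.ndarray,
                measured: set[int],
                batch_size: int) -> list[int]:
    """Pick the unmeasured pool points closest to the LLM's suggestions."""
    chosen: list[int] = []
    for s in suggestion_features:
        dists = np.linalg.norm(pool_features - s, axis=1)
        # Walk candidates from nearest to farthest, skipping ones already used.
        for idx in np.argsort(dists):
            if idx not in measured and idx not in chosen:
                chosen.append(int(idx))
                break
        if len(chosen) == batch_size:
            break
    return chosen
```

Note how this decouples prior-based reasoning (the LLM's suggestions) from batch acquisition (the nearest-neighbor step), which is exactly the separation the authors argue for.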
How to use Google Deep Research to save hours of time
Google hasn't been shy in pushing out new AI tools and features in recent months, from Gemini in Gmail to AI-hosted podcasts. One of the latest innovations unveiled is Google Deep Research, which essentially lets Google's Gemini AI loose on the web, with a mission to thoroughly research a topic of your choice. Imagine there's something you need to do that would normally require a lot of Googling: It could be finding the best phone to upgrade to, for example, or trying to understand how a self-driving car is put together, or charting out the history of Scotland in the 17th century. Deep Research can take on any kind of challenge like this. It's what's known as an "agentic feature"--a trending term in AI that basically means these bots get more agency and control over what they're doing.
Postdoctoral Fellow in Artificial Intelligence and Law, Sweden
Postdoctoral positions are appointed primarily for purposes of research. Applicants are expected to hold a Swedish doctoral degree or an equivalent degree from another country. In the first instance, a person is sought who has completed a Swedish doctoral degree in Law with a focus on IT Law and AI, holds a foreign degree deemed to correspond to this, or has achieved equivalent scientific competence, in each case no more than three years before the application deadline. If there are special circumstances, the doctoral degree may have been completed earlier. Such circumstances include leave due to illness, parental leave, clinical service, elected positions within trade unions, or other similar circumstances.
Tenure Track position as Assistant Professor in Machine Learning
Umeå University welcomes applications for a tenure track position as Assistant Professor in Machine Learning. The position is established through the Wallenberg AI, Autonomous Systems and Software Program (WASP, http://wasp-sweden.org/). The last day to apply is 2019-10-14. Subject description: Machine learning (ML) is the discipline concerned with computer software that can learn autonomously. ML, including both neural network-based approaches and mathematical statistics-based approaches, has become a driving force behind many recent breakthroughs in artificial intelligence, and is used in widely different areas such as speech recognition, image analysis, natural language understanding, machine translation, question answering systems, protein folding, and even playing Go.
The Ferocious Complexity Of The Cell
Fifty years ago, the first molecular dynamics papers allowed scientists to exhaustively simulate systems with a few dozen atoms for picoseconds. Today, due to tremendous gains in computational capability from Moore's law, and due to significant gains in algorithmic sophistication from fifty years of research, modern scientists can simulate systems with hundreds of thousands of atoms for milliseconds at a time. Put another way, scientists today can study systems tens of thousands of times larger, for billions of times longer, than they could fifty years ago. The effective reach of physical simulation techniques has expanded ten-trillion-fold in handleable computational complexity. The scope of this achievement should not be underestimated; the advent of these techniques, along with the maturation of deep learning, has permitted a host of start-ups (1, 2, 3, etc.) to investigate diseases using tools that were hitherto unimaginable. The dramatic progress of computational methods suggests that one day scientists should be able to exhaustively understand complete human cells.
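As a quick sanity check on the quoted figures (the atom counts and timescales below are rough readings of the text, not exact data), the two gains multiply out to the claimed factor:

```python
# Back-of-the-envelope check of the ten-trillion-fold claim above.
atoms_then, atoms_now = 30, 3e5    # "a few dozen" -> "hundreds of thousands"
time_then, time_now = 1e-12, 1e-3  # picoseconds -> milliseconds, in seconds

size_gain = atoms_now / atoms_then  # ~1e4: tens of thousands of times larger
time_gain = time_now / time_then    # 1e9: a billion times longer
print(size_gain * time_gain)        # ~1e13, i.e. ten-trillion-fold
```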